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University 

We consider an empirical likelihood inference for parameters de- 
fined by general estimating equations when some components of the 
random observations are subject to missingness. As the nature of the 
estimating equations is wide-ranging, we propose a nonparametric 
imputation of the missing values from a kernel estimator of the con- 
ditional distribution of the missing variable given the always observ- 
able variable. The empirical likelihood is used to construct a profile 
likelihood for the parameter of interest. We demonstrate that the 
proposed nonparametric imputation can remove the selection bias in 
the missingness and the empirical likelihood leads to more efficient 
parameter estimation. The proposed method is further evaluated by 
simulation and an empirical study on a genetic dataset on recombi- 
nant inbred mice. 

1. Introduction. Missing data are encountered in many statistical ap- 
plications. A major undertaking in biological research is to integrate data 
generated by different experiments and technologies. Examples include the 
effort by genenetwork.org and other data depositories to combine genetics, 
microarray data and phenotypes in the study of recombinant inbred mouse 
lines [34]. One problem in using measurements from multiple experiments 
is that different research projects choose to perform experiments on differ- 
ent subsets of mouse strains. As a result, only a portion of the strains have 
all the measurements, while other strains have missing measurements. The 
current practice of using only those complete measurements and ignoring 
incomplete measurements with missing values is undesirable since the se- 
lection bias in the missingness can cause the parameter estimators to be 
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inconsistent. Even in the absence of the selection bias (missing completely 
at random), the complete measurements-based inference is generally not ef- 
ficient as it throws away data with missing values. Substantial research has 
been done to deal with missing data problems; see [17] for a comprehen- 
sive overview. Inference based on estimating equations [3, 9] is a general 
framework for statistical inference, accommodating a wide range of data 
structure and parameters. It has been used extensively for conducting semi- 
parametric inference in the context of missing values. Robins, Rotnitzky 
and Zhao [25, 26] proposed using the parametrically estimated propensity 
scores to weigh estimating equations that define a regression parameter, and 
Robins and Rotnitzky [24] established the semiparametric efficiency bound 
for parameter estimation. The approach based on the general estimating 
equations has the advantage of being more robust against model misspec- 
ification, although a correct model for the conditional distribution of the 
missing variable given the observed variable is needed to attain the semi- 
parametric efficiency. See [32] for a comprehensive review. 

In this paper we consider an empirical likelihood based inference for pa- 
rameters defined by general estimating equations in the presence of missing 
values. Empirical likelihood introduced by Owen [19, 20] is a computer- 
intensive statistical method that facilitates a likelihood-type inference in 
a nonparametric or semiparametric setting. It is closely connected to the 
bootstrap as the empirical likelihood effectively carries out the resampling 
implicitly. On certain aspects of inference, empirical likelihood is more at- 
tractive than the bootstrap, for instance its ability of internal studentizing 
so as to avoid explicit variance estimation and producing confidence regions 
with natural shape and orientation; see [21] for an overview. In an impor- 
tant development, Qin and Lawless [23] proposed an empirical likelihood 
for parameters defined by general estimating equations and established the 
Wilks theorem for the empirical likelihood ratio. Chen and Cui [5] show that 
the empirical likelihood of [23] is Bartlett correctable, indicating that the 
empirical likelihood has this delicate second-order property of the conven- 
tional likelihood more generally than previously anticipated. In the context 
of missing responses, Wang and Rao [33] studied empirical likelihood for 
the mean with imputed missing values from a kernel estimator of the con- 
ditional mean, and demonstrated that some of the attractive features of the 
empirical likelihood continue to hold. 

When the parameter of interest defined by the general estimating equa- 
tions is not directly related to a mean, or a regression model is not assumed 
as the model structure, the commonly used conditional mean-based impu- 
tation via either a parametric [36] or nonparametric [7] regression estimator 
may result in either biased estimation or reduced efficiency; for instance, 
when the parameter of interest is a quantile (conditional or unconditional) 
or some covariates are subject to missingness. To suit the general nature 
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of parameters defined by general estimating equations and to facilitate a 
nonparametric likelihood inference in the presence of missing values, we 
propose a nonparametric imputation procedure that imputes missing val- 
ues repeatedly from a kernel estimator of the conditional distribution of the 
missing variables given the fully observable variables. To control the variance 
of the estimating functions with imputed values, the estimating functions 
are averaged based on multiply-imputed values. We show that the maxi- 
mum empirical likelihood estimator based on the nonparametric imputation 
is consistent and is more efficient than the estimator that ignores missing 
values. In particular, when the number of the estimating equations is the 
same as the dimension of the parameter, the proposed empirical likelihood 
estimator attains the semiparametric efficiency bound. 

The paper is structured as follows. The proposed nonparametric impu- 
tation method is described in Section 2. The formulation of the empirical 
likelihood is outlined in Section 3. Section 4 gives theoretical results of the 
proposed nonparametric imputation-based empirical likelihood estimator. 
Results from simulation studies are reported in Section 5. Section 6 ana- 
lyzes a genetic dataset on recombinant inbred mice. All technical details are 
provided in the Appendix. 

2. Nonparametric imputation. Let Zj = {XJ , Y[) T , % = 1, . . . ,n, be a set 

of independent and identically distributed random vectors, where AYs are 
<i x -dimensional and are always observable, and l^'s are d y -dimensional and 
are subject to missingness. In practice, the missing components may vary 
among incomplete observations. For ease of presentation, we assume the 
missing components occupy the same components of Z^. Extensions to the 
general case can be readily made. Furthermore, our use of Y{ for the miss- 
ing variable does not prevent it being either a response or covariates in a 
regression setting. 

Let be a p-dimensional parameter so that E{g(Zi,9)} = at a unique 
9q, which is the true parameter value. Here g(Z,6) = (gi(Z,6), . . . ,g r (Z,6)) T 
represents r estimating functions for an integer r >p. The interest of this 
paper is in the inference on 6 when some Y^s are missing. 

Define Si = 1 if Yi is observed and Si = if Yi is missing. Like in [7, 33] 
and others, we assume that 5 and Y are conditionally independent given X, 
namely the strongly ignorable missing at random proposed by Rosenbaum 
and Rubin [27]. As a result, 

P(S = 1\ Y, X) =P(S= 1 | X) =: p(X), 

where p{x) is the propensity score and prescribes a pattern of selection bias 
in the missingness. 
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Let F(y\Xi) be the conditional distribution of Y given X = Xi, and W(-) 
be a d x -dimensional kernel function of the qth order satisfying 

J W{s u ...,s dx )dsi--- ds dx = 1, 

si, . . . , Sd x ) ds\ ■ ■ ■ dsd x = for any i = 1, . . . , d x and 1 < I < q 



Js[W( 



and / sjW(s\, . . . ,Sd x ) ds\ ■ ■ ■ dsd x / 0. A kernel estimator of F(y\Xi) based 
on the sample is 

Here h is a smoothing bandwidth and !(•) is the d y -dimensional indica- 
tor function, which is defined as I(Yl <y) = l if all components of Y{ are 
less than or equal to the corresponding components of y, respectively, and 
< y) = otherwise. The property of the kernel estimator when there 
are no missing values is well understood in the literature, for instance in 
[12]. Its properties in the context of the missing values can be established 
in a standard fashion. An important property that mirrors one for uncondi- 
tional multivariate distribution estimators given in [15] is that the efficiency 
of F(y\Xi) is not influenced by the dimension of Yi. Here we concentrate 
on the case that Xi is a continuous random vector. Extension to discrete 
random variables can be readily made; see Section 5 for an implementation 
with binary random variables. 

We propose to impute a missing Yi with a Yi, which is randomly generated 
from the estimated conditional distribution F{y\Xi). Effectively Yi has a 
discrete distribution where the probability of selecting a Y\ with Si = 1 is 

(2) wm-x^/h] 



Z^SjWKXj-xj/hy 

To control the variability of the estimating functions with imputed values, 
we make k independent imputations {Yi u }* =1 from F{y\Xi) and use 

K 

(3) = 5 i9 (X u Y u 6) + (1 - 5i)K~ l Y, 9{Xi,Y iv , 6) 

v=\ 

as the estimating function for the ith observation. Like the conventional 
multiple- imputation procedure of Rubin [28], to attain the best efficiency, 
k is required to converge to oo. Our numerical experience indicates that 
setting k = 20 worked quite well in our simulation experiments reported in 
Section 5. A theoretical justification for this choice can be drawn from a 
remark to Theorem 2 in the next section. 
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The way missing values are imputed depends critically on the nature of 
the parameter 9 and the underlying statistical model. A popular imputation 
method is to impute a missing Yi by the conditional mean of Y given X = Xi 
as proposed in [36] under a parametric regression model and in [7] and [33] 
via the kernel estimator for the conditional mean. However, this conditional 
mean imputation may not work for a general parameter and a general model 
structure other than the regression model; for instance, when the parame- 
ter is a correlation coefficient or a conditional or unconditional quantile [1] 
where the estimating equation is based on a kernel smoothed distribution 
function. Nor is it generally applicable to missing covariates in a regression 
context. In contrast, the proposed nonparametric imputation is applicable 
for any parameter defined by estimating equations, and the way we impute 
the missing values is independent of the estimating equations. The latter is 
specially attractive as this separation of the imputation and the inference 
steps is considered a major advantage of the multiple imputation approach 
proposed by Rubin [28]. 

It should be noted that, when k — > oo, the proposed method is equivalent 
to imputing the estimating functions with missing Y{S using the Nadaraya- 
Watson estimator of E{g(X{, Yi, 9)\Xi}, 



The imputation of the estimating function has the imputation and inference 
steps intertwined together except in some special cases, for instance when 
9 is the mean of Yi as considered in Cheng [7] and Wang and Rao [33] . In 
that case, g(Z, 9) = (Y — 9) and rh g {Xi,9) is a simple difference between the 
kernel estimator of E(Y\Xi) and 9, which effectively separates the imputa- 
tion and the inference step. However, for a general estimation equation, the 
imputation and inference steps may not be separable. This means that, as 
the search for an estimator of 9 is made through the parameter space, the 
imputation has to be repeated whenever there is a change in the 9 value. 
This computational burden would be particularly severe for the empirical 
likelihood, and more so when a resampling-based approach, for instance the 
bootstrap, is used to profile the empirical likelihood ratio. In contrast, the 
proposed approach generates a fixed set of missing values. Once they are 
generated, the same algorithm for data without missing values can be used 
without reimputation. 

The curse of dimension is an issue with the kernel estimator F(y\Xi). 
However, as demonstrated in Section 4, since the target of the inference is 
a finite-dimensional 9 rather than the conditional distribution F(y\Xi), the 
curse of dimension does not pose any leading order effects on ^-estimation as 
long as the bias of the kernel estimator is controlled. When d x > 4, control- 
ling the bias requires the order of the kernel q > 2, the so-called high-order 





G 



D. WANG AND S. X. CHEN 



kernel, so that ^pnh q — > instead of y^nh 2 — > when a conventional sec- 
ond order kernel is used. Using a high-order kernel may occasionally cause 
F(y\Xj) not being a proper conditional distribution as the imputation prob- 
ability weights in (2) may be negative in the tails. However, the occurrence 
of this phenomenon is rare for large sample sizes as F{y\x) is a consistent 
estimator of F{y\x). In practice, we can readjust the probability weights in 
(2) by setting negative weights to zero and rescaling the remaining weights 
to assure that all weights sum up to one. This readjustment is similar to 
the method used by Hall and Murison [10] for high-order kernel density 
estimators. 

3. Empirical likelihood. The nonparametric imputation produces an ex- 
tended sample including (Xi,Yi) T for each 8{ = 1, and (JQ, {Yj„}" =1 ) T for 
each 5i = 0. With the imputed estimating functions gi(9), the usual estimat- 
ing equation approach can be used to make inference on 8. The variance 
of the general estimating equation-based estimator for 9 can be estimated 
using a sandwich estimator and the confidence regions can be obtained by 
asymptotic normal approximation. In this article, we would like to carry 
out a likelihood type inference using empirical likelihood, encouraged by its 
attractive performance for estimating equations without missing values, as 
demonstrated by Qin and Lawless [23] and the work of Wang and Rao [33] 
for inference on a mean with missing responses. An advantage of empirical 
likelihood is that it has no predetermined shape of the confidence region; 
instead, it produces regions that reflect the features of the data set. Our pro- 
posal of using empirical likelihood in conjunction with the nonparametric 
imputation is especially attractive, since it requires only weak assumptions 
for both imputation and inference procedures while it also has the flexibility 
inherent to empirical likelihood and the estimating equations. 

Let pi represent the probability weight allocated to (ji(0). The empirical 
likelihood for 6 based on giiO) is 

{n n n \ 

i=l i=l i=l J 

By the standard derivation of empirical likelihood [23], the optimal pi is 

1 1 

where t{9) is the Lagrange multiplier that satisfies 

< 5 > «^>4?TTJfbr°- 
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Let £ n (9) = — log{L n (9)/n n } be the log empirical likelihood ratio. The 
maximum empirical likelihood estimator (MELE), B n , can be derived by 
maximizing L n {9) or minimizing £ n (9). 

When the estimating function g(Z, 9) is differentiable with respect to 9, 
the MELE can be found via solving the following system of equations [23], 



Like the conventional parametric maximum likelihood estimation (MLE), 
there may be multiple solutions to the likelihood equation (6) depending on 
the form of g(Z,9) and the underlying distribution. It is required that each 
solution be substituted back to L n {9) to identify the MELE. 

4. Main results. In this section, we first present a theorem regarding the 
consistency of 9 n , which is a solution of the likelihood equation (6). We then 
discuss the estimation efficiency and propose confidence regions for 9 based 
on the empirical likelihood ratio. We use 9q to denote the true parameter 
value. 

Theorem 1. Under the conditions given in the Appendix, as n — > oo 
and k — > oo, with probability tending to 1 the likelihood equation (6) has a 
solution 9 n within the open ball \\9 — 9q\\ < Cn -1 ^ for a positive constant. 

The theorem indicates consistency of 9 n . The nature of the result cor- 
responds to Lemma 1 of Qin and Lawless [23] on the consistency of the 
maximum empirical likelihood estimator without missing values and is an 
analogue of Cramer [8] for parametric MLEs. 

Next we consider the efficiency of 9 n . Write g(Z) =: g(Z,9o). We define 



Theorem 2. Under the conditions given in the Appendix, as n — > oo 
and k — ► oo, 




Qm{9,t) = and Q n2 (e,t)=0, 




(7) T = E[p- 1 (X)Y^{g(Z)\X}+E{g(Z)\X}E{g T (Z)\X}} 

(8) f = E[p(X) Vav{g(Z)\X} + E{g(Z)\X}E{g T (Z)\X}} 
and V = {E(%yt~ 1 E{%)r 1 a t9 = 9 Q . 



with E = VE{% ) T t- l Tt- l E(% )V. 
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The estimator 9 n is consistent and asymptotically normal for 9$ and the 
potential selection bias in the missingness as measured by the propensity 
score p(x) has been filtered out. If there are no missing values, r = T = 
E(gg T ), which means that 

M<s)W)-^(g)r. 

This is the asymptotic variance of the maximum empirical likelihood estima- 
tor based on full observations given in [23] . Comparing the forms of £ with 
and without missing values shows that the efficiency of the maximum em- 
pirical likelihood estimator based on the proposed imputation will be close 
to that based on full observations if either the proportion of missing data 
is low, or if E{p- 1 (X)Vax(g\X)} is small relative to E{E(g\X)E(g T \X)}, 
namely when X is highly "correlated" with Y. 

In the case of 9 = EY , S = E{a 2 (X) /p(X)} + Var{m(X)}, where a 2 (X) = 
V&r(Y\X) and m(X) = E(Y\X). Thus, in this case 9 n is asymptotically 
equivalent to the estimator proposed by Cheng [7] and Wang and Rao [33] 
based on the conditional mean imputation. 

When r = p, namely the number of estimating equations is the same as 
the dimension of 9, 



which is the semiparametric efficiency bound for the estimation of 9 as given 
by Chen, Hong and Tarozzi [6]. 

Like the multiple imputation of Rubin [28] , our method requires k — > oo . 
To appreciate this proposal, we note that when k is fixed, the T and T 
matrices used to define S are 

T = E[{p-\X) + k-\1 -p(X))} Var( 5 LY) + E{g\X)E(g T \X)\ 

and 

f = E[{p(X) + k-\1 - p{X))}\ W {g\X) + E(g\X)E(g T \X)}. 

Hence, a larger k will reduce the terms in T and T, which are due to a single 
nonparametric imputation. Our numerical experience suggests that k = 20 is 
sufficient for most situations and would make the — p(X))-term small 

enough. 

Let us now turn our attention to the log empirical likelihood ratio 

1l n (9 ) = 2£ n (9 )-2£ n (9 n ). 

Let I r be the r-dimensional identity matrix. The next theorem shows that 
the log empirical likelihood ratio converges to a linear combination of inde- 
pendent chi-square distributions. 
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Theorem 3. Under the conditions given in the Appendix, as n — > oo 
and k — ► oo, 

n n (e )^Q T nQ, 

where Q~JV(0,J r ) and = T^t^E^ )VE{% ) T f" 1 r 1 /2. 
When there are no missing values, V = V = E(gg T ) and 

-^(I^Wir^g)]-' 
XB (|)W)-^, 

which is symmetric and idempotent with tr(fi) = p. This means that 

which is the nonparametric version of Wilks theorem established in Qin and 
Lawless [23]. 

When there are missing values, Wilks theorem for empirical likelihood is 
no longer available due to a mismatch between the variance of the quan- 
tity n~ 1/2 Er=i3i(#o) and the probability limit of n" 1 £? =1 gi(9 )gl(9 ). 
This phenomenon also appears when a nuisance parameter is replaced by 
a plugged-in estimator as revealed by Hjort, McKeague and Van Keilegom 
[13]. 

When 9 = EY, TZ n (9 ) £ {Vi(6 ) /V 2 (6 )}xl where 

Vi(0 o ) = E{a 2 (X)/p(X)} + Var{m(X)} 

and V 2 (9 ) = E{a 2 (X)p(X)} + Var{m(X)}. This is the limiting distribution 
given in [33]. 

As confidence regions can be readily transformed to test statistics for test- 
ing a hypothesis regarding 9, we shall focus on confidence regions. There are 
potentially several methods for constructing confidence regions for 9. One 
is based on an estimation of the covariance matrix £ and the asymptotic 
normality given in Theorem 2. Another method is to estimate the matrix SI 
in Theorem 3 and then use Fourier inversion or a Monte Carlo method to 
simulate the distribution of the linear combinations of chi-squares. Despite 
the loss of Wilks theorem, confidence regions based on the empirical like- 
lihood ratio R n (9) still enjoy the attractions of likelihood-based confidence 
regions in terms of having natural shape and orientation and respecting the 
range of 9. 

We propose the following bootstrap procedure to approximate the dis- 
tribution of R n (9o). Bootstrap for imputed survey data has been discussed 
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in Shao and Sitter [30] in the context of ratio and regression imputations. 
We use the following bootstrap procedure in which the bootstrap data set 
is imputed in the same way as the original data set: 

1. Draw a simple random sample x n with replacement from the extended 
sample x n = {pQ, Yi) T for each Si = 1 and (Xi, {Yi u }^ =1 ) T for each 5i = 0; i = 
l,...,n}. 

2. Let Xnc be the portion of x n without imputed values and Xnm be the 
set of vectors in the bootstrap sample with imputed values. Then replace all 
the imputed Y values in Xnm using the proposed imputation method where 
the estimation of the conditional distribution is based on Xnc- 

3. Let I* (0 n ) be the empirical likelihood ratio based on the reimputed data 
set Xn > @n be the corresponding maximum empirical likelihood estimator and 
n*0 n ) = 2t0 n )-2£*(§*). 

4. Repeat the above steps S-times for a large integer B and obtain B 
bootstrap values {TZl(9 n )}^ =1 . 

Let be the 1 — a sample quantile based on {lZl(0 n )}b=i- Then, an 
empirical likelihood confidence region with nominal coverage level 1 — a is 
la = {0 \ R(0) < 1 a }- The following theorem justifies that this confidence 
region has correct asymptotic coverage. 

Theorem 4. Suppose the conditions given in the Appendix are satisfied 
and Q ~ N(0,I r ). Then, the conditional distribution of 1Z*(8 n ) given the 
original sample x n converges to the distribution of Q T £IQ in probability as 
n — > oo and k — > oo . 

5. Simulation results. We report results from two simulation studies in 
this section. In each study, the proposed empirical likelihood inference based 
on the proposed nonparametric imputation is compared with the empirical 
likelihood inference based on (1) the complete observations only by ignor- 
ing data with missing values and (2) the full observations since the missing 
values are known in a simulation. When there is a selection bias in the miss- 
ingness, the complete observations-based estimator may not be consistent. 
The proposed imputation will remove the selection bias in the missingness 
and improve estimation efficiency due to utilizing more data information. 
Obtaining the full observations-based estimator allows us to gauge how far 
away the proposed imputation based estimator is from the ideal case. 

We also compare the proposed method with a version of the inverse prob- 
ability weighted generalized method of moments (IPW-GMM) described in 
[6], in which the estimating functions involving complete observations are 
inflated by nonpar ametrically estimated propensity scores. Based on the 
usual formulation of the generalized method of moments (GMM) [11], the 
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weighted-GMM estimator for 9q considered in our simulation is 

r i » i 1 T f i n i 1 

where n c is the number of complete observations, At is a nonnegative def- 
inite weighting matrix and p{Xi) is a consistent estimator for p{Xi). The 
difference between the weighted-GMM method we use and that of [6] is 
that we used a kernel-based estimator for p(Xi), instead of the sieve esti- 
mator described in [6]. The bandwidth used to construct p{Xi) is obtained 
by the cross-validation method. The kernel function W(-) is taken to be the 
Gaussian and product Gaussian kernels, respectively, for the two simula- 
tion studies. Cross-validation method is also used to choose the smoothing 
bandwidth in the kernel estimator F(y\X) given in (1) for the proposed non- 
parametric imputation; see [4] for details. Simulation results not reported 
here confirm that our proposed method is not sensitive to the choice of band- 
width. To satisfy the requirement y/nh 2 — ► 0, we use half of the bandwidth 
produced by the cross-validation procedure. This is only a rule of thumb. Al- 
ternatively, we could use the bandwidth obtained from the cross-validation 
with a higher order kernel. That would prescribe a bandwidth satisfying the 
condition asymptotically. 

5.1. Correlation coefficient. In the first simulation, the parameter 6 is 
the correlation coefficient between two random variables X and Y where X is 
always observed, but Y is subject to missingness. We first generate bivariate 
random vector (Xj, Ui) T from a skewed bivariate t-distribution suggested in 
[2] with five degrees of freedom, mean (0,0) r , shape parameter (4, l) r and 
dispersion matrix 



ft 



1 0.955 
0.955 1 



Then we let Yi = Ui — 1.2XiI(Xi < 0). These make (Xi,Yi) T have mean 
(0, 0.304) and correlation coefficient 0.676. 
We consider three missing mechanisms: 

(a) p(x) = (0.3 + 0.175|a?|)J(|x| < 4) + 7(|s| > 4); 

(b) p(x) = 0.65 for all x; 

(c) p(x) = 0.5I(x > 0) + I(x < 0). 

The mechanism (b) is missing completely at random, whereas the other two 
are missing at random and prescribe selection bias in the missingness. 

Let fi x and \i y be the means, and a 2 and a 2 be the variances of X and 
Y, respectively. In the construction of the empirical likelihood for 6, the 
correlation coefficient, A = (fi x , fi y ,a 2 ,ay) T are treated as nuisance parame- 
ters. When all observations are complete (no missing data), the estimating 
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equation can be written as n 1 Ya=i 9{Xi,Yi, 6, A) = with 

/ Xi-fx x \ 



g(Xi,Yi,0,\) 



(Xi - fi x ) 2 - a 2 
(Yi - fiy) 2 - a 2 y 

\(Xi - H X )(Yi - fly) - 60 X Gy) 



Table 1 contains the bias and standard deviation of the four estimators 
considered based on 1000 simulations with the sample size n = 100 and 200, 
respectively. It also contains the empirical likelihood confidence intervals 
using the full observations, complete observations only and the proposed 
nonparametric imputation method at a nominal level of 95%. They are all 
based on the proposed bootstrap calibration method with B = 1000. When 
using the nonparametric imputation method, k = 20 imputations were made 
for each missing YJ. The confidence intervals based on the weighted-GMM 
are calibrated using the asymptotic normal approximation with the covari- 
ance matrix estimated by the kernel method. 

The results in Table 1 can be summarized as follows. The proposed em- 
pirical likelihood estimator based on the nonparametric imputation method 
significantly reduced the bias compared to inference based only on complete 
observations when the data were missing at random but not missing com- 
pletely at random. The estimator based on the completely observed data 
suffered severe bias under missing mechanisms (a) and (c). The proposed 
estimator had smaller standard deviations than the complete observation- 
based estimator under all three missing mechanisms, including the case of 
missing completely at random. The weighted-GMM method also performed 
better than the complete observation-based estimator. However, it had larger 
variance than the proposed estimator. Most strikingly, the standard devia- 
tions of the empirical likelihood estimator based on the proposed imputation 
method were all quite close to the full observation-based estimator, which 
confirmed its good theoretical properties. Confidence intervals based on the 
complete observations only and the weighted-GMM method could have se- 
vere under-coverage: the former is due to the selection bias and the latter 
is due to the normal approximation. The proposed confidence intervals had 
satisfactory coverages, which are quite close to the nominal level 0.95. 

5.2. Generalized linear models with missing covariates. In the second 
simulation study we consider missing covariates in a generalized linear model 
(GLM). We also take the opportunity to discuss an extension of the pro- 
posed imputation procedure to binary random variables. Commonly used 
methods in dealing with missing data for GLM are reviewed in [14]. Empir- 
ical likelihood for GLMs with no missing data was first studied by Kolaczyk 
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Table 1 

Inference for the correlation coefficient with missing values. The four methods considered 
are empirical likelihood using full observations, empirical likelihood using only complete 
observations (Complete obs.), inverse probability weighting based generalized method of 
moments (Weighted-GMM) , and empirical likelihood using the proposed nonparametric 
imputation (N. imputation). The nominal coverage probability of the confidence interval 

is 0.95 



Methods Bias Std. dev. MSE Coverage Length of CI 

n= 100 



Full observations 


— 


,0026 


0.0895 


0.0080 





936 





,3555 








Missing mechanism (a) 










Complete obs. 





.0562 


0.1222 


0.0181 





851 





,4967 


Weighted-GMM 





,0108 


0.1112 


0.0125 


0. 


776 





,2495 


N. imputation 


— 


,0092 


0.1041 


0.0109 


0. 


945 





,4875 








Missing mechanism (b) 










Complete obs. 


-0 


,0080 


0.1162 


0.0136 


0. 


930 





,4482 


Weighted-GMM 


-0 


,0150 


0.1069 


0.0117 


0. 


802 





2763 


N. imputation 


-0 


,0138 


0.0999 


0.0101 





932 





,4173 








Missing mechanism (c) 










Complete obs. 


-0 


.1085 


0.1442 


0.0326 





832 





5593 


Weighted-GMM 


-0 


,0266 


0.1167 


0.0143 





786 





,2860 


N. imputation 


-0 


,0383 


0.1053 


0.0125 


0. 


928 





,4322 








n = 200 












Full observations 





,0071 


0.0610 


0.0038 


0. 


958 





,2484 








Missing mechanism (a) 










Complete obs. 





,0710 


0.0776 


0.0111 


0. 


824 





3161 


Weighted-GMM 





,0112 


0.0734 


0.0055 





799 





,2060 


N. imputation 





,0038 


0.0709 


0.0050 


0. 


955 





3180 








Missing mechanism (b) 










Complete obs. 


-0 


,0030 


0.0799 


0.0064 


0. 


937 





3091 


Weighted-GMM 


-0 


,0031 


0.0719 


0.0052 


0. 


832 





,2075 


N. imputation 


-0 


,0023 


0.0668 


0.0045 





942 





2797 








Missing mechanism (c) 










Complete obs. 


-0 


,0915 


0.0979 


0.0179 





,788 





3919 


Weighted-GMM 


-0 


.0107 


0.0745 


0.0057 





820 





,2131 


N. imputation 


-0 


.0118 


0.0680 


0.0048 





936 





,2860 



[16]. Application of empirical likelihood method to GLMs can help overcome 
difficulties with parametric likelihood, especially in the aspect of overdisper- 
sion. 

To demonstrate how to extend the proposed method to discrete compo- 
nent in Xi, we consider a logistic regression model with binary response 
variable X% and covariates X±, X2 and Y. We choose logit{P(A A 3j = 1)} = 
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-l + X li + X 2i -1.5Y i , X U ~N(0,0.5 2 ), X 2i ~iV(3,0.5 2 ) an d Y i being bi- 
nary with logit{P(Yi = 1)} = -1 + Xu + 0.5X 2i . Here Xu, X 2i and X 3i 
are always observable while the binary Yi is subject to missingness with 
logit {P(Yj is missing)} = 0.5 + 2Xu + X 2 i — 3X 3 i. This model with d x = 3 
also allows us to see if there is a presence of the curse of dimension due to 
the use of the kernel estimator in the proposed imputation procedure. 

When no missing data are involved, the empirical likelihood analysis for 
the logistic model simply involves the estimating equations 

n 

n~ l Y / S t {X 3i -7r(SlP)} = 
i=i 

with Si = (1, Xu, X 2 i, Yi) T , (3 being the parameter and ir(z) = exp(z)/{l + 
exp(z)}. Although our proposed imputation in Section 2 is formulated di- 
rectly for continuous random variables, binary response X 3 i can be accom- 
modated by splitting the data into two parts according to the value of X 3 i , 
and then applying the proposed imputation scheme to each part by smooth- 
ing on the continuous Xu and X 2 i- The maximum empirical likelihood es- 
timator for (3 uses a modified version of the fitting procedure described in 
Chapter 2 of [18]. 

The results of the simulation study with n = 150 and 250 are shown 
in Table 2. Despite that the dimension of Xj is increased to 3, the stan- 
dard deviations of the proposed estimator were still quite close to the full 
observation-based empirical likelihood estimator, which was very encourag- 
ing. For parameters (3o , (3± and (3 2 , the mean squared error of the proposed 
estimator is several folds smaller than that based on the complete obser- 
vations only; the proposed method also leads to a reduction in the mean 
squared error by as much as one fold relative to the weighted-GMM. All 
three methods give similar mean squared errors for the parameter (3 3 , while 
the proposed estimator had the smallest mean squared error. The confidence 
intervals based on only complete observations or the weighted-GMM tend 
to show notable undercoverage, while the proposed confidence intervals have 
satisfactory coverage levels for all parameters. 

6. Empirical study. Microarray technology provides a powerful tool in 
molecular biology by measuring the expression level of thousands of genes si- 
multaneously. One problem of interest is to test whether the expression level 
of genes is related to a traditional trait like body weight, food consumption 
or bone density. This is usually the first step in uncovering roles that a gene 
plays in important pathways. The BXD recombinant inbred strains of mice 
were derived from crosses between C57BL/6J (B6 or B) and DBA/2J (D2 
or D) strains [35]. Around 100 BXD strains have been established by re- 
searchers at University of Tennessee and the Jackson Laboratory. A variety 
of phenotype data are accumulated for a BXD mouse over the years [22]. 
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Table 2 

Inference for parameters in a logistic regression model with missing values. The four 
methods considered are empirical likelihood using full observations (Full obs.), empirical 
likelihood using only complete observations ( Complete obs.), inverse probability weighting 
based generalized method of moments (Weighted-GMM), and empirical likelihood using 
the proposed nonparametric imputation (N. imputation). The nominal coverage 
probability of the confidence interval is 0.95 



Methods Bias Std. dev. MSE Coverage Length of CI 



n= 150 







#> = - 


-1 






Full obs. 


-0.0296 


1.292 


1.669 


0.964 


5.477 


Complete obs. 


-1.715 


1.618 


5.559 


0.920 


6.840 


Weighted-GMM 


-0.7835 


1.562 


3.053 


0.891 


5.250 


N. imputation 


0.0349 


1.317 


1.736 


0.967 


5.549 






Px = 


1 






Full obs. 


0.0519 


0.4384 


0.1949 


0.964 


1.820 


Complete obs. 


0.7898 


0.5603 


0.9377 


0.796 


2.510 


Weighted-GMM 


0.4302 


0.5486 


0.4860 


0.834 


1.811 


N. imputation 


-0.0605 


0.4388 


0.1962 


0.961 


1.851 






/& = 


1 






Full Obs. 


0.0367 


0.4500 


0.2038 


0.972 


2.007 


Complete obs. 


0.4205 


0.5590 


0.4892 


0.945 


2.599 


Weighted-GMM 


0.2542 


0.5484 


0.3653 


0.896 


1.791 


N. imputation 


-0.0110 


0.4576 


0.2095 


0.966 


1.993 








1.5 






Full obs. 


-0.0531 


0.4979 


0.2507 


0.976 


2.137 


Complete obs. 


-0.0684 


0.5713 


0.3310 


0.975 


2.592 


Weighted-GMM 


-0.0751 


0.5843 


0.3471 


0.838 


1.574 


N. imputation 


0.0718 


0.5521 


0.3100 


0.966 


2.474 



The trait that we consider is the fresh eye weight measured on 83 BXD 
strains by Zhai, Lu and Williams (ID 10799, BXD phenotype data base). 
The Hamilton Eye Institute Mouse Eye M430v2 RMA Data Set contains 
measures of expression in the eye on 39,000 transcripts. It is of interest to 
test whether the fresh eye weight is related to the expression level of certain 
genes. However, the microarray data are only available for 45 out of the 83 
BXD mouse strains for which fresh eye weights are all available. The most 
common practice is to use only complete observations and ignore missing 
values in the statistical inference. As demonstrated in our simulation, this 
approach can lead to inconsistent parameter estimators if there is a selection 
bias in the missingness. Even in the absence of selection bias, the estimators 
are not efficient as only those complete observations are used. 
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Table 2 
( Continued) 



Methods Bias Std. dev. MSE Coverage Length of CI 



n = 250 
po = -l 



Full obs. 


-0.0286 


0.9651 




0.9321 


0.956 


3.916 


Complete obs. 


-1.670 


1.212 




4.255 


0.801 


4.790 


Weighted-GMM 


-0.6393 


1.150 




1.7304 


0.862 


3.832 


N. imputation 


0.0284 


0.9801 




0.9615 


0.962 


3.963 






Pi 


= 1 








Full obs. 


0.0195 


0.3332 




0.1114 


0.953 


1.349 


Complete obs. 


0.7270 


0.4398 




0.7220 


0.665 


1.789 


Weighted-GMM 


0.3166 


0.4223 




0.2786 


0.782 


1.304 


N. imputation 


-0.0660 


0.3367 




0.1177 


0.947 


1.380 








= 1 








Full obs. 


0.0305 


0.3374 




0.1147 


0.958 


1.400 


Complete obs. 


0.3902 


0.4134 




0.3232 


0.867 


1.729 


Weighted-GMM 


0.1966 


0.3993 




0.1981 


0.874 


1.297 


N. imputation 


-0.0173 


0.3406 




0.1163 


0.967 


1.384 






Pa = 


-1.5 






Full obs. 


-0.0611 


0.3818 




0.1495 


0.950 


1.529 


Complete obs. 


-0.0351 


0.4445 




0.1988 


0.963 


1.797 


Weighted-GMM 


-0.0419 


0.4596 




0.2130 


0.791 


1.165 


N. imputation 


0.0762 


0.4377 




0.1974 


0.944 


1.759 



We conduct four separate simple linear regression analyses of the eye 
weight (x) on the expression level (y) of four genes, respectively. The esti- 
mating equation can be written as n _1 Y^"i=i 9(Xi, Yi, 0) = 0, where 

Xi - 0i - e 2 Yi \ 

X i Y i -9 1 Y i -6 2 Y?) 

and 9\ and 62 represent the intercept and slope, respectively. The genes 
are H3071E5, Slc26a8, Tex9 and Rpsl6. Here we have missing covariates in 
our analysis. The missing gene expression levels are imputed from a kernel 
estimator of the conditional distribution of the gene expression level given 
the fresh eye weight. The smoothing bandwidths were selected based on the 
cross-validation method, which is 1.5 for the first three genes in Table 3 and 
1.8 for gene Rpsl6. 

Table 3 reports empirical likelihood estimates of the intercept and slope 
parameters and their 95% confidence intervals based on the proposed non- 
parametric imputation and empirical likelihood. It also contains results from 
a conventional parametric regression analysis using only the complete obser- 
vations, assuming independent and identically normally distributed resid- 



g(X i ,Y i ,9) = 
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Table 3 

Parameter estimates and confidence intervals (shown in parentheses) based on a simple 
linear regression model using the parametric method with complete observations only and 
the empirical likelihood method using the proposed nonparametric imputation. For the 
parametric inference, the confidence intervals for the intercept and slope are obtained 
using quantiles of the t- distribution, and the confidence intervals for the correlation 
coefficient are obtained by Fisher's z transformation 



Complete observations only Nonparametric imputation 

Gene (parametric) (with empirical likelihood) 



Intercept 



H3071E5 


-21.99 


(-40.97, -2.998) 


15.69 


(-37.02, 5.209) 


Slc26a8 


73.59 


(49.45, 97.73) 


67.28 


(38.34, 95.87) 


Tex9 


-23.81 


(-46.12, -1.507) 


14.66 


(-38.57, 8.776) 


Rpsl6 


-13.52 


(-31.08, 4.041) 


-8.090 


(-26.76, 10.18) 






Slope 






H3071E5 


10.16 


(5.720, 14.59) 


8.736 


(2.688, 14.21) 


Slc26a8 


-6.352 


(-9.294, -3.411) 


-5.561 


(-9.431, -1.471) 


Tex9 


5.101 


(2.588, 7.613) 


4.094 


(0.8753, 6.979) 


Rpsl6 


6.766 


(3.371, 10.16) 


5.754 


(1.948, 9.236) 






Correlation coefficient 




H3071E5 


0.5757 


(0.3395, 0.7436) 


0.4426 


(0.1321, 0.6977) 


Slc26a8 


-0.5533 


(-0.7285, -0.3102) 


-0.4319 


(-0.6809, -0.0761) 


Tex9 


0.5296 


(0.2996, 0.7124) 


0.4024 


(0.1013, 0.6846) 


Rpsl6 


0.5256 


(0.2744, 0.7097) 


0.4151 


(0.0755, 0.6613) 



uals. Table 3 shows that these two inference methods can produce quite 
different parameter estimates and confidence intervals. The difference in pa- 
rameter estimates is as large as 50% for the intercept and 25% for the slope 
parameter. Table 3 also reports estimates and confidence intervals of the 
correlation coefficients using the proposed method and Fisher's z transfor- 
mation. The latter is based on the complete observations only and is the 
method used by genenetwork.org. We observe again differences between the 
two methods despite not being significant at the 5% level. The largest differ- 
ence of about 30% is registered at gene H3071E5. As indicated earlier, part 
of the differences may be the estimation bias of the complete observations- 
based estimators as they are unable to filter out selection bias in the miss- 
ingness. 

APPENDIX 

Let f(x) be the probability density function of X and 
m g (x)=E{g(X,Y,e )\X = x}. 
The following conditions are needed in the proofs of the theorems. 
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CI: The missing propensity function p(x), the X-density f(x) and m g (x) 
all have bounded partial derivatives with respect to x up to an order q with 
q > 2 and 2q > d x , and ini x p(x) > cq for some cq > 0. 

C2: The true parameter value 9q is the unique solution to E{g(Z,9) = 0}; 
the estimating function g(x,y,9o) has bounded qth. order partial derivatives 
with respect to x, and E\\g(Z, 9q)\\ 4 < oo. 

C3: The second partial derivative d 2 g(z,9)/d9d9 T is continuous in 9 in 
a neighborhood of the true value #0; functions \\dg(z,9)/d9\\, \\g(z,9)\\ 3 and 
\\d 2 g(z, 9)/d9d9 T \\ are all uniformly bounded by some integrable function 
M(z) in the neighborhood of 9q. 

C4: The matrices T and V defined in (7) and (8) are, respectively, positive 
definite and E[dg(z,9)/d9] has full column rank p. 

C5: The smoothing bandwidth h satisfies nh dx — ► 00 and yfnh q — ► as 
n — > 00. Here q is the order of the kernel K. 

Assuming p(x) being bounded away from zero in CI implies that data 
cannot be missing with probability 1 anywhere in the domain of the X 
variable. Conditions C2, C3 and C4 are standard assumption for empirical 
likelihood-based inference with estimating equations. In condition C5, that 
^fnh q — > is to control the bias induced by the kernel smoothing, whereas 
that nh dx — > 00 leads to consistent estimation of the conditional distribution. 
To simplify the exposition and without loss of generality, we will only deal 
with the case that q = 2 in the proof. 

Lemma A.l. Assume that conditions C1-C5 are satisfied, then as n — ► 
00 and k — ► 00, 

n 

^ 1/2 E^(^o)-^(o,r), 

i=l 

where T = E{p- 1 {X)Var(g\X) + E(g\X)E{g T \X)}. 

For the proof of Lemma A.l, we need the following proposition, which is 
a direct consequence of Lemma 1 in [29] . 

Proposition A.l. Let {V{\ be a sequence of random variables such 

that, for some function h, as n — ► 00, h(Vi,...,V n ) — ► H, where S has a 
distribution function G. If {Ui} is a sequence of random variables such that 

P{U n - h(Vi,.. .,V n )<s\Vi,...,V n }-> F(s) 

almost surely for all s £ R, where F is a continuous distribution function, 
then 

P(U n <t)^(G*F)(t) 
for all t S R, where "*" denotes the convolution operator. 
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Proof of Lemma A.l. Let u£W and ||u|| = 1. Also let g u (Z,9 ) = 
u T g(Z,8o) and g u i(6o) = uT Si(^o)- First we need to show that 



-1/2 ' 

i=l 

and then use the Cramer- Wold device to prove Lemma A.l. Define 
m gu (x) = E(g u (X,Y,e )\X = x) 

and 

mgu{x} EtiW(z-^)A) 

Now we have 

1 n { K 1 

-YX k9u{Xi,YiM + (i - Si)*' 1 Y.9u{x u Y iv ,e G ) 

U i=l { v=l J 

1 n 

= - z Z S i{Su(X i ,Y i ,e )-m gn (X i )} 

n 1=1 

1 n { K 1 

+ - £(1 - %){ « £ 9u(Xi,Y iu , 9 ) - m gu (Xi) 

n i=l I v=l ) 

Y n 1 n 

+ -£(!" S *){™9u - m gu (X t )} + - £ m gu (X,) 

li . Ti . 

1 = 1 1=1 

'■ = S n + A n + T" n , + R n . 

Note that and R n are sums of independent and identically distributed 
random variables. Define rj(x) = p{x)f{x) and fj(x) = - YJj=i SjWh(Xj — x) 
as its kernel estimator, where Wh{u) = h~ dx W{u/h). Then, 

T n = - 2^(1 - *i) " 7^ 

" ~i v(Xi) 

+ i £(1 - (X) - 

+ I y (i _ 5, ) f 5 o W ^ X i - X *)( m 9u (Xj) - m gu (X,)) | 

nfr[ 1 I ??(Ai) J 

: = T n i + T n ,2 + T„3 . 

We now derive the asymptotic distribution of T n \. Note that, by exchanging 
the summation operator, 

„ If, , v (lMEUSjWhjXj - XJiguiX^Yj^o) - m gu (X,-)} 

J-ni = - 2_^\ l ~ °i) m 
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(l-SjWhiXi-XA 



ii- 



ii' 



J2J2 6 d9u(X l ,Y J ,6)-m gu (X 3 )}- 



j=l i=l 



j=n=i 



v{Xi) 



IJ- 



say. 



Define 



j=i i=i 



and write T n \ = T n \ + (T n \ — T n \ ) . The following derivation will show that 
T n \ is dominated by T n %, while (T n \ — T n \) is of smaller order. We note by 
ignoring terms of O p (h 2 ), which are o p (n~ 1 / 2 ) under the assumption that 
^/nh 2 -> 0, 



'-nl 



n 



n 



3=1 



{g u (Xi ,Y j ,9 )-m gu (Xj ) } 



(1 - Si)W h (Xi - Xj) 



n 



Y^ 5 3 E X i \X h Yj 
3=1 



3=1 



E 



(1 - 5 l )W h (X l - X 3 



X 3> Y 3 



v(Xi) 

{g u (Xi,Yj,6 ) -rriguiXj)} 
(l-PiX^WhiX-Xj) 



Xj,Yj,X; 



T)(Xi) 



where Exi\Xj,Yj{') represents conditional expectation on Xi given (Xj,Yj). 
Then, 



1 

f n \ = - y] 5j 

3=1 



{g u (x,Yj,e ) - m gu (Xj)} 



{\-p{x)}W h {x-Xj) 
r){x) 



1 n r 

3=1 



n f-? J 

3=1 



{g u (x, Yj,e ) - mg^Xj)} ^ p ^ } W h (x - Xj) 



f(x)dx 



dx 



{g u (Xj + hs,Yj,eo) - m ^)} { \[x] X +^s) )} w{s) 



ds. 
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Since both g u and p{x) = {1 — p(x)}/p(x) have bounded second derivative 
on x, and yfnh? — ► as n — ► oo, a Taylor expansion around Xj leads to 

(A.l) f nX = -^5 j {g u (X ] ,Y ] ,9 )-m gu (X J )} 1 ~ P j X ' ) +o p {n~ 1 / 2 ). 
n j=i PV^j) 

Now we show T n \ — T n \ = o p {n~ 1 / 2 ). Let 



1 n 

TfiH — ^ ' Q 

and 



n " 

3=1 



1 

Tnii = — E{Qjj I (J¥j,ijf,5j)}. 

By straight forward derivations, 
n£(T nl - T nl ) 2 

1 n 2 
(A. 2) = — y E(T ri u — T n \i) H — }E{(T n ii-T n ii)(T n ik — T n ik)} 

i=i ij^k 

= E(T n li — T n ii) 2 . 

The last step used the fact that Ei^j{(T n u — T n ii)(T n ij — T n ij)} = 0, which 
can be shown by conditioning on the completely observed portion of data. 
Thus, 



nE(T nl -f nl f 



EJ-nli ~ El nli 



< ET nli 



<E 



0, 



by a standard derivation in kernel estimation. This suggests that T n \ = T n \ + 
o p (n -1 / 2 ). By a standard argument, it may be shown that T n 2 = o p (n^ 1 / 2 ). 
For T n 3, a similar derivation to that for T n \ shows that T n ^ = o p (n~ 1 / 2 ). 
Thus, 

(A.3) V^T n ^ N[0,E{(1 - p(X)) 2 a 2 gu (X)/p(X)}}, 

where <7 2 u (X) = Yav{g u (X, Y, 6 ) | X}. 
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Note that y/nS n N[0, E{p(X)a 2 gu (X)}} and ^/ER n N[0, Xar{m gu (X)}] 
by the central limit theorem. Furthermore, it can be shown that 

nCov(S n , T n ) =E{{1- p(X))a 2 gu (X)} + o(l), 

nCov(R n , S n ) = and nCov(R n ,T n ) = o(l). It readily follows that 



\fn I T n 



N[0, 



T 

Var(m 9n (X)) 



where 



T 



E{p(X)al(X)} 
E{{l-p{X))a 2 (X)} E{(l-p(X)) 2 a 2 (X)/p(X)} 



E{(1 - p(X))al(X)} 



Hence, we have 

(A.4) y^(5 n + T n + R n ) 4 JV[0, (X)/p(X)} + Var{m, u (X)}]. 

Now we consider the asymptotic distribution of 

1 n { K 1 



i=l 



Given all the original observations, ra -1 / 2 (l — <5,;){k -1 X^=i 9u(Xi, Yi u , 9) — 
fh(Xi)}, i = 1, 2, . . . , n, are independent with conditional mean zero and con- 
ditional variance — #i){7 gu (JQ) — rh% u (Xi)}. Here 

n 

is a kernel estimator of J gu {x) = E{g 2 (X, Y,9q)\X = x}. By verifying Lya- 
pounov's condition, we can show that conditioning on the original obser- 
vations, ^/nA n has an asymptotic normal distribution with mean zero and 
variance (uk) -1 Ya=i(^ ~ ~ ^SLP^)}- The conditional variance 

n 

(A.5) M- 1 ^ - W fc (X0 - A -p{X)}al(X)]. 

By Proposition A.l, we can show that, as n — > oo and k — > oo, ^/n(S n + T n + 
R n + A n ) converges to a normal distribution with mean and variance 

Var{m ff[i (Z, ^ )} + ^{P" 1 (A)< (X)} = U T r U . 

Hence, n -1 / 2 £™ =1 <7 u (Xj, 6> ) iV(0, u T lV). And Lemma A.l is proved by 
using the Cramer-Wold device. □ 
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Lemma A. 2. Under the conditions CI— C5, as n — > oo and k — > oo, 

1 n 

where f = E{p(X)V&r(g\X) + E{g\X)E{g T \X)} . 

PROOF. Consider the (j, fc)th element of the matrix ^ Ya=i 9i(&o)9ii^o), 
that is, 

1 n 

ii. 

i=i 

where g^^Oo) and gi(k){@o) represent the jth and fcth element of the vector 
gi(6o), respectively, for < j, k < r. Similarly, we use gu\ to represent the 
jth element of g. Note that 

1 n 

n i=l 

1 n 



n . 



n i=l { v=\ J I z/=l J 

Moreover, 

1 - 

finl = -22^i{9(j)(Zi,0o) - m gu) (Xi)}{g ik) (Zi,eo) -m fl(fc) (Xj)} 
n t=i 

i=l i=l 
1 " 

+ - ^2 s i9{k)(Zi,0o)m gu) (Xi) 

: = B n \ a + 5 n ib + B n \c + -B n ld. 

It is obvious that B n \ a , B^, B n \ c and B n id are all sums of independent 
and identically distributed random variables. By law of large numbers and 
the continuous mapping theorem, we can show that 

B nl ZE\p(X) Cov{g U} (Z, 6 ) , g (k) (Z, ) \X} + p(X)m 9u (X)m g(k) (X)} , 
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where 

Cov{g 0) (Z,9 ),g {k) (Z,9a)\X} 

= E[{g u) (Z,9 )-m gu) (X l )}{g (k) (Z,e ) - m gw (Xi)}\X\. 

Note that 



1 

^n2 = -E( 1 -^) 

1=1 



u=l ) K v=l 



k- 1 J2 9(j) (XuYiv, 9 ) \ \ K' 1 £ 5(fe) (Xi,Y iu , 9 ) j 

m fl0) (Xi)m fl(fc) (Xi) 
^ 5^(1 - 5i){m gu) (Xi)m g(k) (Xi) - m 9{j) (X l )m g(k) {Xi)} 



1 n 



1 n 



+ -XX 1 - ^) m sy ) ( JS£: i) m S(fc)( X i) 



n . 



: — B n 2a + B rl 2b + B n 2c- 

As k _1 J2u=i 9(j)(Xi, Y{ u , 9o) has conditional mean rh g{j) (Xi) given the orig- 
inal observations % n , it can be shown that B n 2 a as k — > oo. By ar- 
gument similar to those used for (A. 3), B rl 2b as n — > oo. Obviously 
S n 2c is the sum of independent and identically distributed random vari- 
ables, which leads to B n 2 C E[{1 — p(X)}m gu) (Aj)m 9(fe) (Aj)]. Hence, we 

have B n 2 A ^[{1 — p(X)}m g ^ ) (Aj)m 9{fc) (Xi)] as n — > oo and k — > oo. There- 
fore, 

S nl + B n2 ^S[p(X)Cov{ 5(i) (Z ! o ),5(fe)(^^o)|^} + m g( . ) (X)m g(fe) (X)]. 
This completes the proof of Lemma A. 2. □ 

Proof of Theorem 1. The proof of Theorem 1 is very similar to that 
of Lemma 1 of Qin and Lawless [23] . Briefly, we can show 

= 0(n- 1 / 3 ) 

almost surely uniformly for all 9 such that \\9 — 9q\\ < Cn^ 1 ^ for a positive 
constant C. 

From this and Taylor expansion, we can show £ n (9) = 0(n 1 / 3 ) and £ n (9o) = 
O(loglogn) almost surely. Noting that £(9) is a continuous function about 
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as belongs to the ball \\0 — 0q\\ < Cn 1 / 3 , with probability tending to 1, 
£ n (0) has a minimum 8 n in the interior of the ball, and this 8 n satisfies 



t n (0) 



08 



E 



{dt T {8)/d8}~g t {0) + {dgi(6)/deyt(6) 



y — 



+ F(0)g i (0){ 80 



0. 



Hence, the n satisfies the second equation of (6). Prom the algorithm of 

the empirical likelihood formulation, the n automatically satisfies the first 
equation of (6). This completes the proof of Theorem 1. □ 

Proof of Theorem 2. Recall that n and i=t{0 n ) satisfy 

Qln{0n,t) = 0, Q2n{0n,t)=O. 

Taking the derivatives with regard to and t T , 

dQm(0,O) 1 ^ dgj(9) dQ ln (0,O) 1 

n*—^ 00 0t T n t-r 1 



00 

dQ 2n (0,O) 



0. 



OQ 2 n(0,0) _ 1 y[%Wl T 
00 



00 OF 
Expanding Qi n (0 n ,t) and Q2n(0n,i) at (0o>O), we have 
O = Qm(0n,i) 

Qlni.00,0) H ^ [0 n — 0o) H ^ (t-0) + Op(( n ), 



— Q2n(0n,t) 
= Q2n(0O,O) + 



08 



9Q2n(g ,0) | 

08 



OV 

dQ2n(0Q,O) 

8t T 



(f-0) + Op(C„), 



where Cn = \\8n — #o|| + \\t\\- Then, we have 



t 



-<9ln(^0,0) + O p (Cn) 
Op(Cn) 



where 



( 9Qin OQin 

0t T 00 

dQ2n „ 



Sil 5i2 

S21 



E 



(0o,o) 



\ E {O0 



(dg\\ 
° J 
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Here dQi n /dt T \rg o0 \ Sn follows from Lemma A. 2, and dQi n /d9\^ o0 ^ 
S\2 can be derived by arguments similar to those used for the proof of 
Lemma A.l. Note that Qi n (9 ,0) = -ELi&^o) = Op(^" 1/2 ), it follows 
that Cn = O p (n _1/2 ). After some matrix manipulation, we have 

V^(0n - 60) = S 22 1 1 5 2 i5r 1 1 v ^Qi„.(^o, 0) + op(l), 

where V = = {E{% ) T f- 1 E{% J}" 1 . By Lemma A.l, ^Qm(0 ,0) 4 
iV(0,r), and the theorem follows. □ 

Proof of Theorem 3. Notice that 

K(0 ) = 2 J>g{l + ^(0 O )} - E 1 ^ 1 + i T H0n)} 

. i i 

where to = t(9o), and 

I0 n ,t) = +t T 9i0n)} = ~Q T m(Oo,0)AQ ln (e ,0) + o p (l), 

i 

where A = S^(I + S^S^i S21 S^). Under H , 

J itfo) =0, t = --S , ri 1 Qln(0O,O)5r i 1 Qln(0O,O) + O p {\) 

n i l + t o9i{^0) 

and E 4 log{l + ^(^)} = -§QI„(0o,O)5r 1 1 Q ln (eo,O) + Op (l). Thus, 

ft(0 o ) =nQI n (0 o ,O)(^-5r 1 1 )Qin(^o,O) +o p (l) 

= V^QUOo, 0)S u 1 S 12 S2 2 \S 21 S u 1 ^Qm(0 , 0) + o p (l). 

Note that 

and, by Lemma A.l, ^/nQl n (9o, 0) — > iV(0, T). This implies the theorem. □ 

Proof of Theorem 4. The proof of Theorem 4 essentially involves 
establishing the bootstrap version of Lemma A.l to Theorem 3. We only 
outline the main steps in proving the bootstrap version of Lemma A.l here. 

Let X*, Y*, Y* u , 8*, g* ui be counter parts of X i: Y i: Y iv , Si, g ui in the 
bootstrap sample; and S n (9 n ), A n (9 n ), T n (9 n ) and R n (9 n ) be the quanti- 
ties S n , A n , T n and R n with 9q replaced by 9 n , respectively. Furthermore, 
let S*(9 n ), An(9 n ), T*(9 n ) and i?*(0 n ) be their bootstrap counterparts. 
First, we will consider the conditional distribution of y/n{S*(9 n ) + T*(9 n ) + 
R*n{6n) - S n (9 n ) - T n (9 n ) - R n (9 n )} given the original data. We use £*(■) 
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and Var*(-) to represent the conditional expectation and variance given the 
original data, respectively. Define 

- ( a \ — $iW((x — Xi)/h)g u (x, Yi, 9 n ) 



E?=iW(0r-^)M) 

and 

E? = iS*W((x- Xi)/h)g u (x,Y?,§, 



J^ =1 5*W((x-X*)/h) 



^ Note that S n (9 n ) + T n (9 n )+R n (9 n ) = ^E?=i{^9u(Z l ,9 n ) + (l-5 i )rh gu (X l 
n )}. Thus, 



S n {0 n ) + T*{9 n ) + R n (9 n ) — S n {6 n ) — T n (6 n ) — R n (9 n 

1 

n 



J2[8!g u (Z*,§ n ) + (1 - 5*)m gu (X*X) 



i=l 

- E*{5*g u (Z*,9 n ) + (1 - 6*)m gu (X*A)}] 
1 n 

+ - EC 1 - A) - m gu (x*,e n )} 

i=l 

:=B l +B 2 . 

It can be shown that B 2 = o p (n~ 1 / 2 ). For B\, we can apply the central limit 
theorem for bootstrap samples, for example, [31] to show that the conditional 
distribution of ^fnB\ given x n 1S asymptotically normal with mean zero and 
variance \ai^{5*g u {Z* ,9 n ) + (1 - 5*)m gu (X* ,9 n )}. 

Using similar methods for Lemma A.l, we can also derive the conditional 
distribution of ^/nA^ l (9 n ) given the observations in the bootstrap sample 
that are not imputed. Then by employing Proposition A.l, it follows that 
the conditional distribution of ra -1 / 2 Ya=x 9ui0n) given x n ^ s asymptoti- 
cally normal with mean zero and variance <r 2 * = Var*{<5*g n (Z?, 9 n ) + (1 — 
5*)rh gu (X* ,9 n )}. The bootstrap version of Lemma A.l is justified by noting 
that <r 2 * converges in probability to u T Tu as n — ► oo, then employing the 
Cramer-Wold device. □ 
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